Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add allow_partial #1512

Merged
merged 21 commits into from
Nov 4, 2024
Merged

Add allow_partial #1512

merged 21 commits into from
Nov 4, 2024

Conversation

samuelcolvin
Copy link
Member

@samuelcolvin samuelcolvin commented Oct 30, 2024

See pydantic/pydantic#10748 for details.

Selected Reviewer: @sydney-runkle

Copy link

codspeed-hq bot commented Oct 30, 2024

CodSpeed Performance Report

Merging #1512 will degrade performances by 16.62%

Comparing allow_partial (b1226d5) with main (5e95c05)

Summary

❌ 1 (👁 1) regressions
✅ 154 untouched benchmarks

Benchmarks breakdown

Benchmark main allow_partial Change
👁 test_frozenset_of_ints_duplicates_core 144.2 µs 173 µs -16.62%

Copy link
Member

@sydney-runkle sydney-runkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surprisingly minimal changes here given the new functionality added - cool!

I'm a bit confused about the naming around enumerate_last_partial - perhaps we could bikeshed just a bit here to make the purpose of this function more clear?

python/pydantic_core/_pydantic_core.pyi Outdated Show resolved Hide resolved
from pydantic_core import SchemaValidator, ValidationError, core_schema


def test_list():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just parametrize here and have a test for validation fails and validation successes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(same questions below)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise, tests lgtm

@sydney-runkle
Copy link
Member

Curious to hear what @davidhewitt thinks about the way this is implemented - overall, looks good to me, I think having things centralized around the allow_partial in the state makes sense, and having a supports_partial feature on each validator (that defaults to false) seems pretty intuitive as well.

It's interesting to me the ways in which fail_fast and allow_partial are sometimes tied together. Perhaps adding documentation in a similar section would be helpful for users.

Copy link
Contributor

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall functionally looks like it's heading in the right direction, but I think there can be some simplifications. I also have a bunch of questions which hint at edge cases.

python/pydantic_core/_pydantic_core.pyi Outdated Show resolved Hide resolved
src/input/input_abstract.rs Show resolved Hide resolved
Comment on lines 829 to 830
Self::Dict(dict) => dict.keys().iter().last(),
Self::Mapping(mapping) => mapping.keys().ok()?.iter().ok()?.last()?.ok(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB both of these are potentially inefficient; .keys() creates a new PyList of all the keys.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, I don't love it, but at least it's only called when allow_partial=true.

Is there a more efficient way? (especially for dicts?)

Long term, if we move to RustModel and iterating over dicts when building typeddicts, we should be able to minimise use of this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.call_method0("keys").iter().last() might be better (Python keys iterator rather than list), but I think it still sucks. Can't get better until we iterate, as you say.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested by adding:

#[pyfunction]
pub fn mapping_last_key_a<'a>(mapping: &'a Bound<'a, PyMapping>) -> Option<Bound<'a, PyAny>> {
    mapping.keys().ok()?.iter().ok()?.last()?.ok()
}

#[pyfunction]
pub fn mapping_last_key_b<'a>(mapping: &'a Bound<'a, PyMapping>) -> Option<Bound<'a, PyAny>> {
    mapping.call_method0(intern!(mapping.py(), "keys")).ok()?.iter().ok()?.last()?.ok()
}

and tested with

from typing import Mapping
import timeit

from pydantic_core import _pydantic_core


class MyMapping(Mapping):
    def __init__(self, d):
        self._d = d

    def __getitem__(self, key):
        return self._d[key]

    def __iter__(self):
        return iter(self._d)

    def __len__(self):
        return len(self._d)


mapping = MyMapping({str(i): i for i in range(100)})

v = _pydantic_core.mapping_last_key_a(mapping)
assert v == '99', v
v = _pydantic_core.mapping_last_key_b(mapping)
assert v == '99', v


def run_bench(func):
    timer = timeit.Timer(
        "func(mapping)", setup="", globals={"func": func, "mapping": mapping}
    )
    n, t = timer.autorange()
    iter_time = t / n
    # print(f'{func.__module__}.{func.__name__}', iter_time)
    return int(iter_time * 1_000_000_000)


print(f'mapping_last_key_a: {run_bench(_pydantic_core.mapping_last_key_a)}ns')
print(f'mapping_last_key_b: {run_bench(_pydantic_core.mapping_last_key_b)}ns')

Output:

mapping_last_key_a: 2487ns
mapping_last_key_b: 1918ns

Outcome: they're both pretty slow, but your suggestion is slightly faster.

Comment on lines 59 to 68
pub fn enumerate_last_partial<'i, I>(
&self,
iter: impl Iterator<Item = I> + 'i,
) -> Box<dyn Iterator<Item = (usize, bool, I)> + 'i> {
if self.allow_partial {
Box::new(EnumerateLastPartial::new(iter))
} else {
Box::new(iter.enumerate().map(|(i, x)| (i, false, x)))
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than forcing dynamic dispatch here, it might be better to use .peekable() at the callsites to be able to check if the current iteration was the last as part of the error pathway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great idea!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

peekable made things worse, but I got rid of the Box dyn in 47f1c15

and it helped quite a lot locally - comparing main to 47f1c15:

┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃  Group      ┃  Benchmark                          ┃  Before (µs/iter)  ┃  After (µs/iter)  ┃  Change  ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│  List       │  list_of_ints_core_py               │             29.17  │            30.09  │  +3.17%  │
├─────────────┼─────────────────────────────────────┼────────────────────┼───────────────────┼──────────┤
│  List JSON  │  list_of_ints_core_json             │             42.49  │            43.57  │  +2.55%  │
├─────────────┼─────────────────────────────────────┼────────────────────┼───────────────────┼──────────┤
│  Set        │  set_of_ints_core                   │             52.26  │            52.81  │  +1.06%  │
│             │  set_of_ints_core_duplicates        │             35.59  │            35.51  │  -0.22%  │
│             │  set_of_ints_core_length            │             53.59  │            55.16  │  +2.94%  │
├─────────────┼─────────────────────────────────────┼────────────────────┼───────────────────┼──────────┤
│  Set JSON   │  set_of_ints_core_json              │             56.31  │            58.40  │  +3.70%  │
│             │  set_of_ints_core_json_duplicates   │             44.14  │            45.07  │  +2.12%  │
├─────────────┼─────────────────────────────────────┼────────────────────┼───────────────────┼──────────┤
│  FrozenSet  │  frozenset_of_ints_core             │             16.60  │            17.26  │  +3.99%  │
│             │  frozenset_of_ints_duplicates_core  │             13.78  │            14.78  │  +7.20%  │
├─────────────┼─────────────────────────────────────┼────────────────────┼───────────────────┼──────────┤
│  Dict       │  dict_of_ints_core                  │             85.74  │            86.36  │  +0.73%  │
├─────────────┼─────────────────────────────────────┼────────────────────┼───────────────────┼──────────┤
│  Dict JSON  │  dict_of_ints_core_json             │             127.4  │            126.5  │  -0.72%  │
└─────────────┴─────────────────────────────────────┴────────────────────┴───────────────────┴──────────┘



def test_tuple_list():
"""Tuples don't support partial, so behaviour should be disabled."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not? At least for variadic tuples this seems potentially acceptable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely want to support it in future, I was just implementing it for the minimum set of validators to prove the idea initially.

tests/emscripten_runner.js Show resolved Hide resolved
for (index, item_result) in iter.enumerate() {

for (index, is_last_partial, item_result) in state.enumerate_last_partial(iter) {
state.allow_partial = is_last_partial && validator.supports_partial();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the .supports_partial() guard necessary (here and in general)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because if you enabled allow_partial for a validator which doesn't have proper support for the feature, you would end up skipping errors in the wrong places.

E.g. if you had list[tuple[list[int], list[str]]:

  • the outer list validator supports allow_partial, so it'll ignore errors in the last entry in the input list
  • but if it passed allow_partial=true down to the tuple validator (which doesn't current support) allow_partial
  • it would pass state through to both inner list validators unchanged
  • then the list validator associated with the 0th entry in the tuple would get allow_partial=true
  • and therefore errors in the last entry of the 0th member of the tuple in the last entry of the outer list would be ignored when they shouldn't be

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels to me like the "doesn't support partial" is the exception rather than the rule, and probably only for non-variadic tuples? Otherwise, I would have thought that:

  • collections would always want to respect allow_partial when passed in (forwarding it to their last element)
  • everything else e.g. with_default etc should not even care about allow_partial and just forward it naively

src/validators/dict.rs Outdated Show resolved Hide resolved
Err(ValError::LineErrors(line_errors)) => {
for err in line_errors {
errors.push(err.with_outer_location(key.clone()));
if !is_last_partial {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I'm feeling very baited to refactor these error combiners now 😂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #1517 as a spike to start down that road.

Copy link
Contributor

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely want to support it in future, I was just implementing it for the minimum set of validators to prove the idea initially.

I think we need it to be approximately right for most validators to actually be worth inviting users to try, and I'm not sure it is at the moment.

Comment on lines 829 to 830
Self::Dict(dict) => dict.keys().iter().last(),
Self::Mapping(mapping) => mapping.keys().ok()?.iter().ok()?.last()?.ok(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.call_method0("keys").iter().last() might be better (Python keys iterator rather than list), but I think it still sucks. Can't get better until we iterate, as you say.

for (index, item_result) in iter.enumerate() {

for (index, is_last_partial, item_result) in state.enumerate_last_partial(iter) {
state.allow_partial = is_last_partial && validator.supports_partial();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels to me like the "doesn't support partial" is the exception rather than the rule, and probably only for non-variadic tuples? Otherwise, I would have thought that:

  • collections would always want to respect allow_partial when passed in (forwarding it to their last element)
  • everything else e.g. with_default etc should not even care about allow_partial and just forward it naively

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like there's very common use cases missing from here:

  • unions
  • nullable
  • defaults

Copy link
Member Author

@samuelcolvin samuelcolvin Nov 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think validators fit into three cases:

  • already done
  • doesn't apply - things like IntValidator where allow_partial doesn't mean anything
  • TODO simple cases - things like nullable where AFAIK it's just a matter of passing the allow_partial instruction to its nested validator - that's my understanding of these but maybe it's more complicated
  • TODO - collections or multiple nested validators where we need custom logic

I think validators break down like this:

Validator Status
TypedDictValidator DONE
UnionValidator TODO simple case?
TaggedUnionValidator TODO simple case
NullableValidator TODO simple case
ModelValidator doesn't apply
ModelFieldsValidator TODO
DataclassArgsValidator TODO
DataclassValidator doesn't apply
StrValidator doesn't apply
StrConstrainedValidator doesn't apply
IntValidator doesn't apply
ConstrainedIntValidator doesn't apply
BoolValidator doesn't apply
FloatValidator doesn't apply
ConstrainedFloatValidator doesn't apply
DecimalValidator doesn't apply
ListValidator DONE
SetValidator DONE
TupleValidator TODO
DictValidator DONE
NoneValidator doesn't apply
FunctionBeforeValidator TODO simple case?
FunctionAfterValidator TODO simple case?
FunctionPlainValidator TODO simple case?
FunctionWrapValidator TODO simple case?
CallValidator TODO simple case?
LiteralValidator doesn't apply
IntEnumValidator doesn't apply
StrEnumValidator doesn't apply
FloatEnumValidator doesn't apply
PlainEnumValidator doesn't apply
AnyValidator doesn't apply
BytesValidator doesn't apply
BytesConstrainedValidator doesn't apply
DateValidator doesn't apply
TimeValidator doesn't apply
DateTimeValidator doesn't apply
FrozenSetValidator DONE
TimeDeltaValidator doesn't apply
IsInstanceValidator doesn't apply
IsSubclassValidator doesn't apply
CallableValidator doesn't apply
ArgumentsValidator TODO
WithDefaultValidator TODO simple case
ChainValidator TODO simple case?
LaxOrStrictValidator TODO simple case?
GeneratorValidator TODO
CustomErrorValidator TODO simple case?
JsonValidator TODO
UrlValidator doesn't apply
MultiHostUrlValidator doesn't apply
UuidValidator doesn't apply
DefinitionRefValidator TODO
JsonOrPython TODO simple case?
ComplexValidator doesn't apply

If that table is correct, we should set the default for supports_partial to true, then set it to false specifically for fields which don't support it. But I'm still not that confident I'm write about all those cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got rid of supports_partial completely, and instead set state.allow_partial = false on the few validators which don't support it yet.

@samuelcolvin
Copy link
Member Author

please review

Copy link
Contributor

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me now, I think this is the right default and makes it explicit where we have decided not to support yet 👍

src/serializers/fields.rs Outdated Show resolved Hide resolved
@samuelcolvin samuelcolvin enabled auto-merge (squash) November 4, 2024 10:51
@samuelcolvin samuelcolvin merged commit a1fa596 into main Nov 4, 2024
28 checks passed
@samuelcolvin samuelcolvin deleted the allow_partial branch November 4, 2024 10:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants